{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* $\\hat h_i$ (with subscript $i$) is selected from hypothesis set $\\mathcal{H}_i$ based on training data.\n",
    "\n",
    "$$\\hat h_i = \\arg \\min_{h \\in \\mathcal{H}_i} \\hat \\epsilon_{S_{train}}(h)$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* $h_i^*$ is also selected from hypothesis set $\\mathcal{H}_i$ based on the whole data, albeit unknown, from the target distribution ($\\mathcal{X}$, $\\mathcal{y}$). It is NOT an estimate, so no $\\hat \\;$ is used for the symbols. \n",
    "\n",
    "$$h_i^* = \\arg \\min_{h \\in \\mathcal{H}_i} \\epsilon(h)$$\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* $\\hat h$ (without subscript $i$) is selected from $\\hat h_i$s, $\\{\\hat h_1, \\cdots, \\hat h_k\\}$, based on cross-validation data.\n",
    "\n",
    "$$\\hat h = \\arg \\min_{h \\in \\{\\hat h_1, \\cdots, \\hat h_k\\}} \\hat \\epsilon_{S_{cv}}(h)$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* data is split into $\\beta$ and $1 - \\beta$ for training and cross-validation, respectively."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# (a)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "According to Hoeffding's inequality,\n",
    "\n",
    "$$\n",
    "P\\big(\\left|\\epsilon(\\hat h_i) - \\hat \\epsilon_{S_{cv}}(\\hat h_i)\\right| > \\gamma \\big) \\le 2 \\exp{(-2 \\gamma^2 \\beta m)}\n",
    "$$\n",
    "\n",
    "It bounds the error between the real error ($\\epsilon$) and estimated error based on cross-validation data ($\\hat \\epsilon_{cv}$ given the selected hypothesis $\\hat h_i$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With $k$ hypothesis classes,\n",
    "\n",
    "\\begin{align*} \n",
    "P\\big(\\exists \\hat h_i \\in \\{\\hat h_1, \\cdots, \\hat h_k\\}, \\left|\\epsilon(\\hat h_i) - \\hat \\epsilon_{S_{cv}}(\\hat h_i)\\right| > \\gamma \\big)\n",
    "&\\le \\sum_{i=1}^k 2 \\exp{(-2 \\gamma^2 \\beta m)} \\\\\n",
    "&= 2k \\exp{(-2 \\gamma^2 \\beta m)}\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Subtracting from 1 at both sides,\n",
    "\n",
    "\\begin{align*} \n",
    "P\\big(\\neg \\exists \\hat h_i \\in \\{\\hat h_1, \\cdots, \\hat h_k\\}, \\left|\\epsilon(\\hat h_i) - \\hat \\epsilon_{S_{cv}}(\\hat h_i)\\right| > \\gamma \\big)\n",
    "&= P\\big(\\forall \\hat h_i \\in \\{\\hat h_1, \\cdots, \\hat h_k\\}, \\left|\\epsilon(\\hat h_i) - \\hat \\epsilon_{S_{cv}}(\\hat h_i)\\right| \\le \\gamma \\big) \\\\\n",
    "&\\ge 1 - 2k \\exp{(-2 \\gamma^2 \\beta m)}\n",
    "\\end{align*}\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first equality means that there does not exist a hypothesis with an error larger than $\\gamma$ is equivalent to that all hypotheses have error less or equal than $\\gamma$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Setting $2k \\exp{(-2 \\gamma^2 \\beta m)}$ to be $1 - \\frac{\\delta}{2}$, then\n",
    "\n",
    "\\begin{align*} \n",
    "\\gamma = \\sqrt{\\frac{1}{2\\beta m}{\\log{\\frac{4k}{\\delta}}}}\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Therefore, with probability at least $1 - \\frac{\\delta}{2}$,\n",
    "\n",
    "$$\n",
    "\\left|\\epsilon(\\hat h_i) - \\hat \\epsilon_{S_{cv}}(\\hat h_i)\\right| \\le \\gamma = \\sqrt{\\frac{1}{2\\beta m}{\\log{\\frac{4k}{\\delta}}}}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# (b)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Suppose\n",
    "\n",
    "\\begin{align*}\n",
    "\\hat h_o &= \\hat h = \\arg \\min_{h \\in \\{\\hat h_1, \\cdots, \\hat h_k\\}} \\hat \\epsilon_{S_{cv}}(h) \\\\\n",
    "\\hat h_j &= \\min_{i=1,\\cdots,k}{\\epsilon(\\hat h_i)}\n",
    "\\end{align*}\n",
    "\n",
    "I thought $\\hat h_o$ is more comprehensible than $\\hat h$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Therefore,\n",
    "\n",
    "\\begin{align*} \n",
    "\\epsilon(\\hat h) = \\epsilon(\\hat h_o)\n",
    "&\\le \\hat \\epsilon_{S_{cv}}(\\hat h_o) + \\gamma \\\\\n",
    "&\\le \\hat \\epsilon_{S_{cv}}(\\hat h_j) + \\gamma \\\\\n",
    "&\\le \\hat \\epsilon(\\hat h_j) + 2\\gamma \\\\\n",
    "&= \\min_{i=1,\\cdots,k}{\\epsilon(\\hat h_i)} + \\sqrt{\\frac{2}{\\beta m}{\\log{\\frac{4k}{\\delta}}}}\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Very analogous to that in the lecture note, https://see.stanford.edu/materials/aimlcs229/cs229-notes4.pdf,\n",
    "\n",
    "* the 1st inequality used the fact that $\\left|\\epsilon(\\hat h_i) - \\hat \\epsilon_{S_{cv}}(\\hat h_i)\\right| \\le \\gamma$ (by uniform convergence).\n",
    "* the 2nd inequality: $\\hat h_o$ is found by minimizing $\\hat \\epsilon_{S_{cv}}(h)$ from $\\{\\hat h_1, \\cdots, \\hat h_k\\}$.\n",
    "* the 3rd inequality used uniform convergence again. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
